The Uniqueness of Unique Identifiers


Status of this Memo 


This memo provides information for the Internet community. It does 
not specify an Internet standard. Distribution of this memo is 
unlimited. 

Abstract 


This RFC provides information that may be useful when selecting a 
method to use for assigning unique identifiers to people. 


1. The Issue 


Computer systems require a way to identify the people associated with
them. These identifiers have been called "user names" or "account
names." The identifiers are typically short, alphanumeric strings.

In general, these identifiers must be unique. 


The uniqueness is usually achieved in one of three ways: 


1) The identifiers are assigned in a unique manner without using 
information associated with the individual. Example identifiers are: 


ax54tv 
cs00034 


This method was often used by large timesharing systems. While it 
achieved the uniqueness property, there was no way of guessing the 
identifier without knowing it through other means. 


2) The identifiers are assigned in a unique manner where the bulk of
the identifier is algorithmically derived from the individual's name.
Example identifiers are:


Craig.A.Finseth-1
Finseth1
caf-1
fins0001


3) The identifiers are in general not assigned in a unique manner:
the identifier is algorithmically derived from the individual's name
and duplicates are handled in an ad-hoc manner. Example identifiers
are:


Craig.Finseth 
caf 


Now that we have widespread electronic mail, an important feature of 
an identifier system is the ability to predict the identifier based 
on other information associated with the individual. This other 
information is typically the person’s name. 


Methods two and three make such predictions possible, especially if
you have one example mapping from a person's name to the identifier.
Method two relies on using some or all of the name and
algorithmically varying it to ensure uniqueness (for example, by
appending an integer). Method three relies on using some or all of
the name and selecting an alternate identifier in the case of a
duplication.


For both methods, it is important to minimize the need for making the
adjustments required to ensure uniqueness (i.e., an integer that is
not 1 or an alternate identifier). The probability that an
adjustment will be required depends on the format of the identifier
and the size of the organization.
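
As an illustration of method two, the following Python sketch derives
the bulk of the identifier from the name and always appends an
integer, starting at 1, to ensure uniqueness. The function name
assign_identifier, the "-" separator, and the in-memory set of taken
identifiers are assumptions made for this example; they are not taken
from any particular system.

```python
def assign_identifier(first, middle_initial, last, taken):
    """Return a unique identifier of the form First.M.Last-n (method two)."""
    base = f"{first}.{middle_initial}.{last}"
    suffix = 1
    # Vary the name algorithmically: try -1, -2, ... until one is free.
    while f"{base}-{suffix}" in taken:
        suffix += 1
    identifier = f"{base}-{suffix}"
    taken.add(identifier)
    return identifier


taken = set()  # hypothetical store of identifiers already assigned
first_id = assign_identifier("Craig", "A", "Finseth", taken)
second_id = assign_identifier("Craig", "A", "Finseth", taken)
print(first_id, second_id)
```

The less often the loop has to go past 1, the more predictable the
identifiers are, which is why the rest of this memo looks at how
likely a collision is.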


2. Identifier Formats 


There are a number of popular identifier formats. This section will
list some of them and supply both typical and maximum values for the
number of possible identifiers. A "typical" value is the number that
you are likely to run into in real life. A "maximum" value is the
largest plausible number of values (without getting extreme about
it). All ranges are expressed as a number of bits, that is, as the
base-2 logarithm of the number of possible values.


2.1 Initials 


There are three popular formats based on initials: those with one,
two, or three letters. (The number of people with more than three
initials is assumed to be small.) Values:


     format               typical   maximum
     I                    4         5
     II                   8         10
     III                  12        15


You can also think of these as first, middle, and last initials:


     F                    4         5
     FL                   8         10
     FML                  12        15


2.2 Names 


Again, there are three popular formats based on using names: those 
with the first name, last name, and both first and last names. 


Values:


     format               typical   maximum
     First                8         14
     Last                 9         13
     First Last           17        27

2.3 Combinations 


I have seen these combinations in use ("F" is first initial, "M" is 
middle initial, and "L" is last initial): 


     format               typical   maximum
     F Last               13        18
     F M Last             17        23
     First L              12        19
     First M Last         21        32
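
The values for a combined format are the sums of the bits of its
parts, as a comparison with Sections 2.1 and 2.2 shows. The Python
sketch below makes that arithmetic explicit. The table COMPONENT_BITS
and the function format_bits are names invented for this example; the
bit counts themselves are the ones given above.

```python
# Bits of information per component as (typical, maximum),
# taken from Sections 2.1 (initials) and 2.2 (names).
COMPONENT_BITS = {
    "F": (4, 5), "M": (4, 5), "L": (4, 5),
    "First": (8, 14), "Last": (9, 13),
}


def format_bits(fmt):
    """Sum the typical and maximum bits of a space-separated format."""
    parts = fmt.split()
    typical = sum(COMPONENT_BITS[p][0] for p in parts)
    maximum = sum(COMPONENT_BITS[p][1] for p in parts)
    return typical, maximum


for fmt in ("F Last", "F M Last", "First L", "First M Last"):
    typical, maximum = format_bits(fmt)
    # 2 ** bits converts a bit count back into a number of identifiers.
    print(f"{fmt:<14}{typical:>3} typical ({2 ** typical:,} values), "
          f"{maximum} maximum")
```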


2.4 Complete List 


Here are all possible combinations of nothing, initial, and full name 
for first, middle, and last. The number of Middle names is assumed 
to be the same as the number of First names. Values: 


     format               typical   maximum
     _ _ _                0         0
     _ _ L                4         5
     _ _ Last             9         13
     _ M _                4         5
     _ M L                8         10
     _ M Last             13        18
     _ Middle _           8         14
     _ Middle L           12        19
     _ Middle Last        17        27
     F _ _                4         5
     F _ L                8         10
     F _ Last             13        18
     F M _                8         10
     F M L                12        15
     F M Last             17        23
     F Middle _           12        19
     F Middle L           16        24
     F Middle Last        21        32
     First _ _            8         14
     First _ L            12        19
     First _ Last         17        27
     First M _            12        19
     First M L            16        24
     First M Last         21        32
     First Middle _       16        28
     First Middle L       20        33
     First Middle Last    25        41


3. Probabilities of Duplicates


As can be seen, the information content in these identifiers in no
case exceeds 41 bits and the typical information content never
exceeds 25 bits. The content of most of them is in the 8 to 20 bit
range. Duplicates are thus not only possible but likely.

The method used to compute the probability of duplicates is the same
as that of the well-known "birthday" problem. For a universe of N
items, the probability that X members contain no duplicates is
expressed by:


     N     N-1     N-2           N-(X-1)
     -  x  ---  x  ---  x ... x  -------
     N      N       N               N


The probability of at least one duplicate is one minus this product.


A program to compute this function for selected values of N is given
in the appendix, as is its complete output. The "1%" column is the
number of items (people) before an organization of that size
(universe) has a 1% chance of a duplicate. Similarly for 2%, 5%,
10%, and 20%.


Finseth                                                       [Page 4]


RFC 1439          Uniqueness of Unique Identifiers


     bits    universe        1%      2%      5%      10%     20%
     6       64              2       3       4       5       6
     7       128             3       3       5       6       8
     8       256             3       4       6       8       12
     9       512             4       6       8       11      16
     10      1,024           6       7       11      16      22
     11      2,048           7       10      15      22      31
     12      4,096           10      14      21      30      44
     13      8,192           14      19      30      43      61
     14      16,384          19      27      42      60      86
     15      32,768          27      37      59      84      122
     16      65,536          37      52      83      118     172
     17      131,072         52      74      117     167     243
     18      262,144         74      104     165     236     343
     19      524,288         104     147     233     333     485
     20      1,048,576       146     207     329     471     685
     21      2,097,152       206     292     465     666     968
     22      4,194,304       291     413     657     941     1369
     23      8,388,608       412     583     929     1330    1936
     24      16,777,216      582     824     1313    1881    2737
     25      33,554,432      822     1165    1856    2660    3871
     26      67,108,864      1162    1648    2625    3761    5474
     27      134,217,728     1644    2330    3712    5319    7740
     28      268,435,456     2324    3294    5249    7522    10946
     29      536,870,912     3286    4659    7422    10637   15480
     30      1,073,741,824   4647    6588    10496   15043   21891
     31      2,147,483,648   6571    9316    14844   21273   30959


For example, assume an organization were to select the "First Last"
form. This form has 17 bits (typical) and 27 bits (maximum) of
information. The relevant line is:


     bits    universe        1%      2%      5%      10%     20%
     17      131,072         52      74      117     167     243


For an organization with 100 people, the probability of a duplicate
would be between 2% and 5% (probably around 4%). If the organization
had 1,000 people, the probability of a duplicate would be much
greater than 20%.


Appendix: Reuse of Identifiers and Privacy Issues


Let's say that an organization were to select the format:


     First.M.Last-#


as my own organization has. Is the -# required, or can one simply
do:


     Craig.A.Finseth


for the first one and


     Craig.A.Finseth-2


(or -1) for the second? The answer is "no": the suffix is required,
although for non-obvious reasons.


Assume that the organization allows the suffix to be omitted for the
first holder of a name and a third party wants to send e-mail to
Craig.A.Finseth. Because of the Electronic Communications Privacy
Act of 1986, an organization must treat electronic mail with care.
In this case, there is no way for the third party to know reliably
that mail sent to Craig.A.Finseth may reach the wrong person. On the
other hand, if the -# suffix is always present and attempts to send
mail to the non-suffix form are rejected, the third party will
realize that they must have the suffix in order to have a unique
identifier.


For similar reasons, identifiers in this form should not be re-used
in the life of the mail system.
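

The two rules above (the suffix is always required, and an identifier
is never re-used) can be sketched as a small registry. This is only
an illustration in Python: the class MailboxRegistry and its methods
are hypothetical and do not describe any real mail system.

```python
class MailboxRegistry:
    """Toy registry: the suffix is mandatory and identifiers are never reused."""

    def __init__(self):
        self.active = set()       # identifiers that currently accept mail
        self.ever_issued = set()  # every identifier issued so far

    def issue(self, base):
        """Issue base-n using the lowest n that has never been issued."""
        suffix = 1
        while f"{base}-{suffix}" in self.ever_issued:
            suffix += 1
        identifier = f"{base}-{suffix}"
        self.active.add(identifier)
        self.ever_issued.add(identifier)
        return identifier

    def retire(self, identifier):
        """Stop accepting mail; the identifier stays reserved forever."""
        self.active.discard(identifier)

    def accepts(self, address):
        """Accept only exact, active, suffixed identifiers."""
        return address in self.active


registry = MailboxRegistry()
first = registry.issue("Craig.A.Finseth")
registry.retire(first)
second = registry.issue("Craig.A.Finseth")
print(first, second, registry.accepts("Craig.A.Finseth"))
```

Because the bare form is never a valid address, a sender who guesses
it gets a rejection instead of silently reaching the wrong person.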


Appendix: Perl Program to Compute Probabilities


#!/usr/local/bin/perl

# For each identifier size from 6 to 31 bits, print the number of
# members at which the chance of a duplicate reaches 1%, 2%, 5%,
# 10%, and 20%.
for $bits (6..31) {
    &Compute($bits);
}

sub Compute {
    $bits = $_[0];
    $num = 1 << $bits;          # size of the universe
    $cnt = $num;                # identifiers not yet used

    print "bits $bits\tnumber $num:\n";

    # $prob is the probability of no duplicate so far.  It carries
    # over from each loop to the next.
    for ($prob = 1; $prob > 0.99; ) {
        $prob *= $cnt / $num;
        $cnt--;
    }
    print "\t", $num - $cnt, "\t$prob\n";

    for (; $prob > 0.98; ) {
        $prob *= $cnt / $num;
        $cnt--;
    }
    print "\t", $num - $cnt, "\t$prob\n";

    for (; $prob > 0.95; ) {
        $prob *= $cnt / $num;
        $cnt--;
    }
    print "\t", $num - $cnt, "\t$prob\n";

    for (; $prob > 0.90; ) {
        $prob *= $cnt / $num;
        $cnt--;
    }
    print "\t", $num - $cnt, "\t$prob\n";

    for (; $prob > 0.80; ) {
        $prob *= $cnt / $num;
        $cnt--;
    }
    print "\t", $num - $cnt, "\t$prob\n";

    print "\n";
}
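

For readers who prefer to experiment interactively, the same
computation can be sketched in Python. This is not part of the
original program; the function names are invented here, and the code
assumes ordinary double-precision arithmetic. It applies the product
from Section 3 directly and also evaluates the 100-person example
for the 17-bit "First Last" form.

```python
def no_duplicate_probability(universe, members):
    """Probability that `members` draws from `universe` are all distinct."""
    prob = 1.0
    for used in range(members):
        prob *= (universe - used) / universe
    return prob


def members_at_risk(universe, risk):
    """Smallest number of members whose chance of a duplicate is at least `risk`."""
    prob = 1.0
    members = 0
    while prob > 1.0 - risk:
        prob *= (universe - members) / universe
        members += 1
    return members


RISKS = (0.01, 0.02, 0.05, 0.10, 0.20)
row_17_bits = [members_at_risk(1 << 17, r) for r in RISKS]
chance_100_people = 1.0 - no_duplicate_probability(1 << 17, 100)
print(row_17_bits)
print(f"{chance_100_people:.1%}")
```

The first printed line should match the 17-bit line of the table in
Section 3, and the second gives the chance of a duplicate among 100
people, which that section estimates at around 4%.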


Perl Program Output


bits 6  number 64:

        2       0.984375
        3       0.95361328125
        4       0.90891265869140625
        5       0.85210561752319335938
        6       0.78553486615419387817


bits 7  number 128:

        3       0.9766845703125
        3       0.9766845703125
        5       0.92398747801780700684
        6       0.88789421715773642063
        8       0.79999355674331695809


bits 8  number 256:

        3       0.988311767578125
        4       0.97672998905181884766
        6       0.94268989971169503406
        8       0.89542306910786462204
        12      0.76969425214152431547


bits 9  number 512:

        4       0.98832316696643829346
        6       0.97102570187075798458
        8       0.946526327751096643648
        11      0.89748056780293572476
        16      0.78916761796439427457


bits 10 number 1024:

        6       0.98543241551841020964
        7       0.97965839745873206645
        11      0.94753115178840541244
        16      0.88888866335604777014
        22      0.79677613655632184564


bits 11 number 2048:

        7       0.98978773152834598203
        10      0.97823367137821537476
        15      0.94990722378677450166
        22      0.89298119682681720288
        31      0.79597589885472519455


bits 12 number 4096:

        10      0.98906539062491305447
        14      0.978004267773009718762
        21      0.94994111694430838355
        30      0.89901365764115603874
        44      0.79312138620093930452


bits 13 number 8192:

        14      0.98894703242829806733
        19      0.97932692503837115439
        30      0.94822407309193512681
        43      0.89545741661906652631
        61      0.7993625840767998314


bits 14 number 16384:

        19      0.98961337517641645434
        27      0.97879319536756481668
        42      0.94876352395820107155
        60      0.89748107890372830209
        86      0.79973683158771624591


bits 15 number 32768:

        27      0.98934263776790121181
        37      0.97987304880641035165
        59      0.94909471808051404373
        84      0.89899774209805793923
        122     0.79809378598190949816


bits 16 number 65536:

        37      0.98988724065590050216
        52      0.97996496661944154649
        83      0.94937874420413270737
        118     0.89996948010355670711
        172     0.79884228150816105618


bits 17 number 131072:

        52      0.98993311138884398925
        74      0.97960010416289267088
        117     0.94952974978505377823
        167     0.89960828942716541956
        243     0.79894309171178368167


bits 18 number 262144:

        74      0.98974844864797828503
        104     0.979777315557223210174
        165     0.94968621078621640041
        236     0.8995926348279144058
        343     0.79944227937165953994


bits 19 number 524288:

        104     0.98983557888923057178
        147     0.97973841652874515962
        233     0.94974719445364064185
        333     0.89991342619657743729
        485     0.79936749144148444568


bits 20 number 1048576:

        146     0.98995567500195758015
        207     0.97987072919607220989
        329     0.94983990872655321702
        471     0.89980857451706741656
        685     0.799774215234216872172


bits 21 number 2097152:

        206     0.98998177463778547214
        292     0.97994400939715686771
        465     0.94985589918092261374
        666     0.89978055267663470396
        968     0.79994886751736571373


bits 22 number 4194304:

        291     0.98999013137747737812
        413     0.97991951242142538714
        657     0.94991674892578203959
        941     0.89991652739633254399
        1369    0.79989205747440361716


bits 23 number 8388608:

        412     0.98995762604049764022
        583     0.97997846530691334888
        929     0.94991024716640248826
        1330    0.89999961063320443877
        1936    0.79987028265451087794


bits 24 number 16777216:

        582     0.98997307486745211857
        824     0.97999203469417239809
        1313    0.94995516684099989835
        1881    0.89997049960675035152
        2737    0.79996700222056416063


bits 25 number 33554432:

        822     0.98999408609360783906
        1165    0.9799956928177964155
        1856    0.9499899669674316538
        2660    0.8999664414095410736
        3871    0.79992328289672998132


bits 26 number 67108864:

        1162    0.98999884535478044345
        1648    0.9799801637652703068
        2625    0.94997437525354821997
        3761    0.89999748465616635773
        5474    0.79993922903192515861


bits 27 number 134217728:

        1644    0.9899880636014986024
        2330    0.97998730103356856969
        3712    0.94997727934463771504
        5319    0.89998552434244594167
        7740    0.79999591580103557309


bits 28 number 268435456:

        2324    0.98999458855588851058
        3294    0.97999828329325222587
        5249    0.94998397932368705554
        7522    0.89998576049206902017
        10946   0.799990587777500076101


bits 29 number 536870912:

        3286    0.98999717306002099626
        4659    0.97999160965267329004
        7422    0.94999720388831232487
        10637   0.89999506567702891591
        15480   0.7999860979665908145


bits 30 number 1073741824:

        4647    0.98999674474047760775
        6588    0.97999531736215383937
        10496   0.94999806770951356061
        15043   0.89999250738244507275
        21891   0.79999995570982085358


bits 31 number 2147483648:

        6571    0.98999869761078929109
        9316    0.97999801528523688976
        14844   0.94999403283519279206
        21273   0.89999983631135749285
        30959   0.79999272222201334159